{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Boosting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we will construct a boosting classifier with the `AdaBoost` algorithm and a boosting regressor with the `AdaBoost.R2` algorithm. These algorithms can use a variety of weak learners but we will use decision tree classifiers and regressors, constructed in {doc}`Chapter 5 </content/c5/concept>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "importing Jupyter notebook from classification_tree.ipynb\n"
     ]
    }
   ],
   "source": [
    "## Import decision trees\n",
    "import import_ipynb\n",
     "import classification_tree as ct\n",
    "\n",
    "## Import numpy and visualization packages\n",
    "import numpy as np \n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn import datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Classification with AdaBoost"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "The following is a construction of the binary AdaBoost classifier introduced in the {doc}`concept section </content/c6/s1/boosting>`. Let's again use the {doc}`penguins </content/appendix/data>` dataset from `seaborn`, but rather than predicting the penguin's species (a multiclass problem), we'll predict whether the species is *Adelie* (a binary problem). The data is loaded and split into training and test sets in the hidden code cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "## Load data\n",
    "penguins = sns.load_dataset('penguins')\n",
    "penguins.dropna(inplace = True)\n",
    "X = np.array(penguins.drop(columns = ['species', 'island']))\n",
    "y = 1*np.array(penguins['species'] == 'Adelie')\n",
    "y[y == 0] = -1\n",
    "\n",
    "## Train-test split\n",
    "np.random.seed(123)\n",
    "test_frac = 0.25\n",
    "test_size = int(len(y)*test_frac)\n",
    "test_idxs = np.random.choice(np.arange(len(y)), test_size, replace = False)\n",
    "X_train = np.delete(X, test_idxs, 0)\n",
    "y_train = np.delete(y, test_idxs, 0)\n",
    "X_test = X[test_idxs]\n",
    "y_test = y[test_idxs]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recall that AdaBoost fits *weighted* weak learners. Let's start by defining the weighted loss functions introduced in the {doc}`concept section </content/c6/s1/boosting>`. The helper function `get_weighted_pmk()` calculates \n",
    "\n",
    "$$\n",
    "\\hat{p}_{mk} = \\frac{\\sumN w_n I(\\bx_n \\in \\mathcal{N}_m)}{\\sumN w_n}\n",
    "$$\n",
    "\n",
    "for each class $k$. The `gini_index()` and `cross_entropy()` functions then call this function and return the appropriate loss. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Loss Functions\n",
    "def get_weighted_pmk(y, weights):\n",
    "    ks = np.unique(y)\n",
    "    weighted_pmk = [sum(weights[y == k]) for k in ks]      \n",
    "    return(np.array(weighted_pmk)/sum(weights))\n",
    "\n",
    "def gini_index(y, weights):\n",
    "    weighted_pmk = get_weighted_pmk(y, weights)\n",
    "    return np.sum( weighted_pmk*(1-weighted_pmk) )\n",
    "\n",
    "def cross_entropy(y, weights):\n",
    "    weighted_pmk = get_weighted_pmk(y, weights)    \n",
    "    return -np.sum(weighted_pmk*np.log2(weighted_pmk))\n",
    "\n",
    "def split_loss(child1, child2, weights1, weights2, loss = cross_entropy):\n",
    "    return (len(child1)*loss(child1, weights1) + len(child2)*loss(child2, weights2))/(len(child1) + len(child2))\n"
   ]
  },
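  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (an illustrative addition), note that with equal weights these functions reduce to the ordinary unweighted losses: a perfectly balanced binary sample gives a Gini index of 0.5 and a cross-entropy of one bit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Sanity check: equal weights recover the unweighted losses\n",
    "y_toy = np.array([1, 1, -1, -1])\n",
    "w_equal = np.repeat(0.25, 4)\n",
    "print(gini_index(y_toy, w_equal))    # 0.5 for a balanced binary sample\n",
    "print(cross_entropy(y_toy, w_equal)) # 1.0 bit\n"
   ]
  },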
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "In order to incorporate observation weights, we have to make slight adjustments to the `DecisionTreeClassifier` class. In the class we {doc}`previously constructed </content/c5/s2/classification_tree>`, the data from parent nodes was split and funneled anonymously to one of two child nodes. This alone will not allow us to incorporate weights. Instead, we also need to track the ID of each observation so we can look up its weight. This is done with the `DecisionTreeClassifier` class defined in the hidden cell below, which is mostly a reconstruction of the class defined in Chapter 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "## Helper Classes\n",
    "class Node:\n",
    "    \n",
    "    def __init__(self, Xsub, ysub, observations, ID, depth = 0, parent_ID = None, leaf = True):\n",
    "        self.Xsub = Xsub\n",
    "        self.ysub = ysub\n",
    "        self.observations = observations\n",
    "        self.ID = ID\n",
    "        self.size = len(ysub)\n",
    "        self.depth = depth\n",
    "        self.parent_ID = parent_ID\n",
    "        self.leaf = leaf\n",
    "        \n",
    "\n",
    "class Splitter:\n",
    "    \n",
    "    def __init__(self):\n",
    "        self.loss = np.inf\n",
    "        self.no_split = True\n",
    "        \n",
    "    def _replace_split(self, loss, d, dtype = 'quant', t = None, L_values = None):\n",
    "        self.loss = loss\n",
    "        self.d = d\n",
    "        self.dtype = dtype\n",
    "        self.t = t\n",
    "        self.L_values = L_values  \n",
    "        self.no_split = False\n",
    "\n",
    "        \n",
    "## Main Class\n",
    "class DecisionTreeClassifier:\n",
    "    \n",
    "    #############################\n",
    "    ######## 1. TRAINING ########\n",
    "    #############################\n",
    "    \n",
    "    ######### FIT ##########\n",
    "    def fit(self, X, y, weights, loss_func = cross_entropy, max_depth = 100, min_size = 2, C = None):\n",
    "        \n",
    "        ## Add data\n",
    "        self.X = X\n",
    "        self.y = y\n",
    "        self.N, self.D = self.X.shape\n",
    "        dtypes = [np.array(list(self.X[:,d])).dtype for d in range(self.D)]\n",
    "        self.dtypes = ['quant' if (dtype == float or dtype == int) else 'cat' for dtype in dtypes]\n",
    "        self.weights = weights\n",
    "        \n",
    "        ## Add model parameters\n",
    "        self.loss_func = loss_func\n",
    "        self.max_depth = max_depth\n",
    "        self.min_size = min_size\n",
    "        self.C = C\n",
    "        \n",
    "        ## Initialize nodes\n",
    "        self.nodes_dict = {}\n",
    "        self.current_ID = 0\n",
    "        initial_node = Node(Xsub = X, ysub = y, observations = np.arange(self.N), ID = self.current_ID, parent_ID = None)\n",
    "        self.nodes_dict[self.current_ID] = initial_node\n",
    "        self.current_ID += 1\n",
    "        \n",
    "        # Build\n",
    "        self._build()\n",
    "\n",
    "    ###### BUILD TREE ######\n",
    "    def _build(self):\n",
    "        \n",
    "        for layer in range(self.max_depth):\n",
    "            \n",
    "            ## Find eligible nodes for layer iteration\n",
    "            eligible_buds = {ID:node for (ID, node) in self.nodes_dict.items() if \n",
    "                                (node.leaf == True) &\n",
    "                                (node.size >= self.min_size) & \n",
    "                                (~ct.all_rows_equal(node.Xsub)) &\n",
    "                                (len(np.unique(node.ysub)) > 1)}\n",
    "            if len(eligible_buds) == 0:\n",
    "                break\n",
    "            \n",
    "            ## split each eligible parent\n",
    "            for ID, bud in eligible_buds.items():\n",
    "                                \n",
    "                ## Find split\n",
    "                self._find_split(bud)\n",
    "                \n",
    "                ## Make split\n",
    "                if not self.splitter.no_split:\n",
    "                    self._make_split()\n",
    "                \n",
    "    ###### FIND SPLIT ######\n",
    "    def _find_split(self, bud):\n",
    "        \n",
    "        ## Instantiate splitter\n",
    "        splitter = Splitter()\n",
    "        splitter.bud_ID = bud.ID\n",
    "        \n",
    "        ## For each (eligible) predictor...\n",
    "        if self.C is None:\n",
    "            eligible_predictors = np.arange(self.D)\n",
    "        else:\n",
    "            eligible_predictors = np.random.choice(np.arange(self.D), self.C, replace = False)\n",
    "        for d in sorted(eligible_predictors):\n",
    "            Xsub_d = bud.Xsub[:,d]\n",
    "            dtype = self.dtypes[d]\n",
    "            if len(np.unique(Xsub_d)) == 1:\n",
    "                continue\n",
    "\n",
    "            ## For each value...\n",
    "            if dtype == 'quant':\n",
    "                for t in np.unique(Xsub_d)[:-1]:\n",
    "                    L_condition = Xsub_d <= t\n",
    "                    ysub_L = bud.ysub[L_condition]\n",
    "                    ysub_R = bud.ysub[~L_condition]\n",
    "                    weights_L = self.weights[bud.observations][L_condition]\n",
    "                    weights_R = self.weights[bud.observations][~L_condition]\n",
    "                    loss = split_loss(ysub_L, ysub_R,\n",
    "                                      weights_L, weights_R,\n",
    "                                      loss = self.loss_func)\n",
    "                    if loss < splitter.loss:\n",
    "                        splitter._replace_split(loss, d, 'quant', t = t)\n",
    "            else:\n",
    "                for L_values in ct.possible_splits(np.unique(Xsub_d)):\n",
    "                    L_condition = np.isin(Xsub_d, L_values)\n",
    "                    ysub_L = bud.ysub[L_condition]\n",
    "                    ysub_R = bud.ysub[~L_condition]\n",
    "                    weights_L = self.weights[bud.observations][L_condition]\n",
    "                    weights_R = self.weights[bud.observations][~L_condition]\n",
    "                    loss = split_loss(ysub_L, ysub_R,\n",
    "                                      weights_L, weights_R,\n",
    "                                      loss = self.loss_func)\n",
    "                    if loss < splitter.loss: \n",
    "                        splitter._replace_split(loss, d, 'cat', L_values = L_values)\n",
    "                        \n",
    "        ## Save splitter\n",
    "        self.splitter = splitter\n",
    "    \n",
    "    ###### MAKE SPLIT ######\n",
    "    def _make_split(self):\n",
    "        \n",
    "        ## Update parent node\n",
    "        parent_node = self.nodes_dict[self.splitter.bud_ID]\n",
    "        parent_node.leaf = False\n",
    "        parent_node.child_L = self.current_ID\n",
    "        parent_node.child_R = self.current_ID + 1\n",
    "        parent_node.d = self.splitter.d\n",
    "        parent_node.dtype = self.splitter.dtype\n",
    "        parent_node.t = self.splitter.t        \n",
    "        parent_node.L_values = self.splitter.L_values\n",
    "        \n",
    "        ## Get X and y data for children\n",
    "        if parent_node.dtype == 'quant':\n",
    "            L_condition = parent_node.Xsub[:,parent_node.d] <= parent_node.t\n",
    "        else:\n",
    "            L_condition = np.isin(parent_node.Xsub[:,parent_node.d], parent_node.L_values)\n",
    "        Xchild_L = parent_node.Xsub[L_condition]\n",
    "        ychild_L = parent_node.ysub[L_condition]\n",
    "        child_observations_L = parent_node.observations[L_condition]\n",
    "        Xchild_R = parent_node.Xsub[~L_condition]\n",
    "        ychild_R = parent_node.ysub[~L_condition]\n",
    "        child_observations_R = parent_node.observations[~L_condition]\n",
    "        \n",
    "        ## Create child nodes\n",
    "        child_node_L = Node(Xchild_L, ychild_L, child_observations_L,\n",
    "                            ID = self.current_ID, depth = parent_node.depth + 1,\n",
    "                            parent_ID = parent_node.ID)\n",
    "        child_node_R = Node(Xchild_R, ychild_R, child_observations_R,\n",
    "                            ID = self.current_ID + 1, depth = parent_node.depth + 1,\n",
    "                            parent_ID = parent_node.ID)\n",
    "        self.nodes_dict[self.current_ID] = child_node_L\n",
    "        self.nodes_dict[self.current_ID + 1] = child_node_R\n",
    "        self.current_ID += 2\n",
    "                \n",
    "            \n",
    "    #############################\n",
    "    ####### 2. PREDICTING #######\n",
    "    #############################\n",
    "    \n",
    "    ###### LEAF MODES ######\n",
    "    def _get_leaf_modes(self):\n",
    "        self.leaf_modes = {}\n",
    "        for node_ID, node in self.nodes_dict.items():\n",
    "            if node.leaf:\n",
    "                values, counts = np.unique(node.ysub, return_counts=True)\n",
    "                self.leaf_modes[node_ID] = values[np.argmax(counts)]\n",
    "    \n",
    "    ####### PREDICT ########\n",
    "    def predict(self, X_test):\n",
    "        \n",
    "        # Calculate leaf modes\n",
    "        self._get_leaf_modes()\n",
    "        \n",
    "        yhat = []\n",
    "        for x in X_test:\n",
    "            node = self.nodes_dict[0] \n",
    "            while not node.leaf:\n",
    "                if node.dtype == 'quant':\n",
    "                    if x[node.d] <= node.t:\n",
    "                        node = self.nodes_dict[node.child_L]\n",
    "                    else:\n",
    "                        node = self.nodes_dict[node.child_R]\n",
    "                else:\n",
    "                    if x[node.d] in node.L_values:\n",
    "                        node = self.nodes_dict[node.child_L]\n",
    "                    else:\n",
    "                        node = self.nodes_dict[node.child_R]\n",
    "            yhat.append(self.leaf_modes[node.ID])\n",
    "        return np.array(yhat)\n",
    "            \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "With the weighted decision tree constructed, we are ready to build our `AdaBoost` class. The class closely follows the algorithm introduced in the concept section, which is copied below for convenience.\n",
    "_____\n",
    "**Discrete AdaBoost Algorithm**\n",
    "\n",
    "Define the target variable to be $y_n \\in \\{-1, +1 \\}$.\n",
    "\n",
    "1. Initialize the weights with $w^1_n = \\frac{1}{N}$ for $n = 1, 2, \\dots, N$.\n",
    "\n",
    "2. For $t = 1, \\dots, T$,\n",
    "\n",
    "   - Build weak learner $t$ using weights $\\mathbf{w}^t$.\n",
    "\n",
     "   - Calculate fitted values $f^t(\\bx_n) \\in \\{-1, +1\\}$ for $n = 1, 2, \\dots, N$. Let $I^t_n$ equal 1 if $f^t(\\bx_n) \\neq y_n$ and 0 otherwise. That is, $I^t_n$ indicates whether learner $t$ misclassifies observation $n$.\n",
    "\n",
    "   - Calculate the weighted error for learner $t$:\n",
    "    \n",
    "     $$\n",
    "     \\epsilon^t = \\frac{\\sumN w^t_n I^t_n}{\\sumN w^t_n}.\n",
    "     $$\n",
    "   \n",
    "   - Calculate the accuracy measure for learner $t$:\n",
    "\n",
    "     $$\n",
    "     \\alpha^t = \\log\\left(\\frac{1-\\epsilon^t}{\\epsilon^t}\\right).\n",
    "     $$\n",
    "    \n",
    "   - Update the weighting with \n",
    "\n",
    "     $$\n",
    "     w^{t + 1}_n = w^t_n\\exp(\\alpha^tI^t_n),\n",
    "     $$\n",
    "    \n",
    "     for $n = 1, 2, \\dots, N$.\n",
    "\n",
    "3. Calculate the overall fitted values with $\\hat{y}_n = \\text{sign} \\left( \\sum_{t = 1}^T \\alpha^t f^t(\\bx_n)\\right)$.\n",
    "_____"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class AdaBoost:\n",
    "    \n",
    "    def fit(self, X_train, y_train, T, stub_depth = 1):\n",
    "        self.y_train = y_train\n",
    "        self.X_train = X_train\n",
    "        self.N, self.D = X_train.shape\n",
    "        self.T = T\n",
    "        self.stub_depth = stub_depth\n",
    "        \n",
     "        ## Initialize weights, weak learners, alphas, and fitted values\n",
    "        self.weights = np.repeat(1/self.N, self.N)\n",
    "        self.trees = []\n",
    "        self.alphas = []\n",
    "        self.yhats = np.empty((self.N, self.T))\n",
    "        \n",
    "        for t in range(self.T):\n",
    "            \n",
     "            ## Fit weak learner t and compute its weighted error and alpha\n",
    "            self.T_t = DecisionTreeClassifier()\n",
    "            self.T_t.fit(self.X_train, self.y_train, self.weights, max_depth = self.stub_depth)\n",
    "            self.yhat_t = self.T_t.predict(self.X_train)\n",
    "            self.epsilon_t = sum(self.weights*(self.yhat_t != self.y_train))/sum(self.weights)\n",
     "            self.alpha_t = np.log( (1-self.epsilon_t)/self.epsilon_t )\n",
     "            ## Scale misclassified weights by (1-epsilon_t)/epsilon_t, i.e. exp(alpha_t)\n",
     "            self.weights = np.array([w*(1-self.epsilon_t)/self.epsilon_t if self.yhat_t[i] != self.y_train[i]\n",
     "                                    else w for i, w in enumerate(self.weights)])\n",
     "            ## Record weak learner, alpha, and fitted values\n",
    "            self.trees.append(self.T_t)\n",
    "            self.alphas.append(self.alpha_t)\n",
    "            self.yhats[:,t] = self.yhat_t \n",
    "            \n",
    "        self.yhat = np.sign(np.dot(self.yhats, self.alphas))\n",
    "        \n",
    "    def predict(self, X_test):\n",
    "        yhats = np.zeros(len(X_test))\n",
    "        for t, tree in enumerate(self.trees):\n",
    "            yhats_tree = tree.predict(X_test)\n",
    "            yhats += yhats_tree*self.alphas[t]\n",
    "        return np.sign(yhats)\n",
    "        \n",
    "        \n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `AdaBoost` model is finally fit below. To train the model, we supply the training data as well as `T`—the number of weak learners—and `stub_depth`—the depth for each tree (our weak learner). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9759036144578314"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "booster = AdaBoost()\n",
    "booster.fit(X_train, y_train, T = 30, stub_depth = 3)\n",
    "yhat = booster.predict(X_test)\n",
    "np.mean(yhat == y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Regression with AdaBoost.R2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's implement *AdaBoost.R2*, a common boosting algorithm for regression tasks. We'll again use the {doc}`tips </content/appendix/data>` dataset from `seaborn`, loaded in the hidden code cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "## Import packages\n",
    "import numpy as np \n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn import datasets\n",
    "\n",
    "## Load data\n",
    "tips = sns.load_dataset('tips')\n",
    "X = np.array(tips.drop(columns = 'tip'))\n",
    "y = np.array(tips['tip'])\n",
    "\n",
    "## Train-test split\n",
    "np.random.seed(1)\n",
    "test_frac = 0.25\n",
    "test_size = int(len(y)*test_frac)\n",
    "test_idxs = np.random.choice(np.arange(len(y)), test_size, replace = False)\n",
    "X_train = np.delete(X, test_idxs, 0)\n",
    "y_train = np.delete(y, test_idxs, 0)\n",
    "X_test = X[test_idxs]\n",
    "y_test = y[test_idxs]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since our boosting class will use regression trees for its weak learners, let's also import the regression tree we constructed in {doc}`Chapter 5 </content/c5/s2/regression_tree>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "importing Jupyter notebook from regression_tree.ipynb\n"
     ]
    }
   ],
   "source": [
    "## Import decision trees\n",
    "import import_ipynb\n",
     "import regression_tree as rt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recall that the final fitted values in *AdaBoost.R2* are based on a weighted median. Let's first make a helper function to return the weighted median.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def weighted_median(values, weights):\n",
    "    \n",
    "    sorted_indices = values.argsort()\n",
    "    values = values[sorted_indices]\n",
    "    weights = weights[sorted_indices]\n",
    "    weights_cumulative_sum = weights.cumsum()\n",
     "    median_index = np.argmax(weights_cumulative_sum >= sum(weights)/2)\n",
     "    return values[median_index]"
   ]
  },
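  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This returns the smallest value at which the cumulative weight reaches at least half of the total weight. As a quick illustrative check, a value holding more than half of the total weight is always the weighted median:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## The value 3 holds 60% of the total weight, so it is the weighted median\n",
    "weighted_median(np.array([1, 2, 3]), np.array([0.2, 0.2, 0.6]))"
   ]
  },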
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "We can then define the `AdaBoostR2` class. This again follows the algorithm closely, which is copied below for convenience.\n",
    "____\n",
    "**AdaBoost.R2 Algorithm**\n",
    "\n",
    "1. Initialize the weights with $w^1_n = \\frac{1}{N}$ for $n = 1, 2, \\dots, N$.\n",
    "\n",
     "2. For $t = 1, 2, \\dots, T$ or while $\\bar{L}^t$, as defined below, is less than 0.5,\n",
    "\n",
    "   - Draw a sample of size $N$ from the training data with replacement and with probability $w^t_n$ for $n = 1, 2, \\dots, N$. \n",
    "   - Fit weak learner $t$ to the resampled data and calculate the fitted values on the original dataset. Denote these fitted values with $f^t(\\bx_{n})$ for $n = 1, 2, \\dots, N$.\n",
    "   - Calculate the observation error $L^t_{n}$ for $n = 1, 2, \\dots, N$:\n",
    "     \n",
    "   $$\n",
    "   \\begin{aligned}\n",
    "   D^t &= \\underset{n}{\\text{max}} \\{ |y_{n} - f^t(\\bx_{n})|  \\} \\\\\n",
    "   L^t_{n} &= \\frac{|y_{n} - f^t(\\bx_{n})|}{D^t}\n",
    "   \\end{aligned}\n",
    "   $$\n",
    "\n",
    "   - Calculate the model error $\\bar{L}^t$:\n",
    "     \n",
    "     $$\n",
    "     \\bar{L}^t = \\sum_{n = 1}^N  L^t_n w^t_n\n",
    "     $$\n",
    "\n",
    "     If $\\bar{L}^t \\geq 0.5$, end iteration and set $T$ equal to $t - 1$.\n",
    "\n",
    "   - Let $\\beta^t = \\frac{\\bar{L}^t}{1- \\bar{L}^t}$. The lower $\\beta^t$, the greater our confidence in the model. \n",
    "\n",
     "   - Let $Z^t = \\sum_{n = 1}^N w^t_n (\\beta^t)^{1 - L^t_n}$ and update the model weights with \n",
    "     \n",
    "     $$\n",
     "     w^{t + 1}_n = \\frac{w^t_n (\\beta^t)^{1 - L^t_n}}{Z^t},\n",
    "     $$\n",
    "\n",
    "     which increases the weight for observations with a greater error $L^t_n$.\n",
    "\n",
    "3. Set the overall fitted value for observation $n$ equal to the weighted median of $f^t(\\bx_n)$ for $t = 1, 2, \\dots, T$ using weights $\\log(1/\\beta^t)$ for model $t$.\n",
    "______"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "class AdaBoostR2:\n",
    "    \n",
    "    def fit(self, X_train, y_train, T = 100, stub_depth = 1, random_state = None):\n",
    "        \n",
    "        self.y_train = y_train\n",
    "        self.X_train = X_train\n",
    "        self.T = T\n",
    "        self.stub_depth = stub_depth\n",
    "        self.N, self.D = X_train.shape\n",
    "        self.weights = np.repeat(1/self.N, self.N)\n",
    "        np.random.seed(random_state)\n",
    "        \n",
    "        self.trees = []    \n",
    "        self.fitted_values = np.empty((self.N, self.T))\n",
    "        self.betas = []\n",
    "        for t in range(self.T):\n",
    "            \n",
    "            ## Draw sample, fit tree, get predictions\n",
    "            bootstrap_indices = np.random.choice(np.arange(self.N), size = self.N, replace = True, p = self.weights)\n",
    "            bootstrap_X = self.X_train[bootstrap_indices]\n",
    "            bootstrap_y = self.y_train[bootstrap_indices]\n",
    "            tree = rt.DecisionTreeRegressor()\n",
    "            tree.fit(bootstrap_X, bootstrap_y, max_depth = stub_depth)\n",
    "            self.trees.append(tree)\n",
    "            yhat = tree.predict(X_train)\n",
    "            self.fitted_values[:,t] = yhat\n",
    "            \n",
    "            ## Calculate observation errors\n",
    "            abs_errors_t = np.abs(self.y_train - yhat)\n",
    "            D_t = np.max(abs_errors_t)\n",
    "            L_ts = abs_errors_t/D_t\n",
    "            \n",
    "            ## Calculate model error (and possibly break)\n",
    "            Lbar_t = np.sum(self.weights*L_ts)\n",
    "            if Lbar_t >= 0.5:\n",
     "                ## Discard the current learner; keep only the t learners fit before it\n",
     "                self.T = t\n",
     "                self.fitted_values = self.fitted_values[:,:t]\n",
     "                self.trees = self.trees[:t]\n",
    "                break\n",
    "            \n",
    "            ## Calculate and record beta \n",
    "            beta_t = Lbar_t/(1 - Lbar_t)\n",
    "            self.betas.append(beta_t)\n",
    "            \n",
    "            ## Reweight\n",
    "            Z_t = np.sum(self.weights*beta_t**(1-L_ts))\n",
    "            self.weights *= beta_t**(1-L_ts)/Z_t\n",
    "            \n",
    "        ## Get median \n",
    "        self.model_weights = np.log(1/np.array(self.betas))\n",
    "        self.y_train_hat = np.array([weighted_median(self.fitted_values[n], self.model_weights) for n in range(self.N)])\n",
    "        \n",
    "    def predict(self, X_test):\n",
    "        N_test = len(X_test)\n",
    "        fitted_values = np.empty((N_test, self.T))\n",
    "        for t, tree in enumerate(self.trees):\n",
    "            fitted_values[:,t] = tree.predict(X_test)\n",
    "        return np.array([weighted_median(fitted_values[n], self.model_weights) for n in range(N_test)]) \n",
    "        \n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we fit our booster by providing training data in addition to `T`—the number of weak learners—and `stub_depth`—the depth for our regression tree weak learners."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbYAAAFPCAYAAAAhlOuhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de3xdZZ3v8e8vSVPSUmgsAWlD4XiZjsBhRKPCiaMgzsgoo3bQ0VGpOiPIqaOOgwPKORw5KueIepzRo4igAhVEEawzL45clIuXMuAEisqtKte2XJrWBNI0Nk327/yxnh1WdrOTnWTvvdZ+9uf9euWV7Ntav7X2zvru51nPWsvcXQAAxKIl6wIAAKgmgg0AEBWCDQAQFYINABAVgg0AEBWCDQAQFYItAma20sx2mllrDedxqZl9ulbTnysze8TMXpt1HbNhZrea2fsabdrTzHO1mW0On8Gj6znvMP/jzGxLveeL/CLYGkjYiI+EDUjxZ7m7P+bu+7r7eHjeXhs3M3Mze0E2lc+NmS01s6+a2ZNmtsvMfm1m7826rloxs78J77GV3N9mZtvM7KSsapvB5yX9ffgMbqzWRMOXqTEzW17Fab7HzMZT/z8Pmdl/rdb0y8xzr+A1s3PNbE+oYdDMbjOzY1OPH2NmPzKz35tZv5l9z8wOrmWdMSHYGs9fhg1I8efxrAuqBTNrl/RjSYdKOlbS/pL+SdJnzOwf61xLW51mtV7SUkmvLrn/REku6fo61TFbh0q6dy4vLNfLYGaLJZ0s6WlJ75x7aVP69+L/j6S3SPpsFi1NSd8NNRwg6RZJ30s91inpIkmHKVm/Q5IuqXeBjYpgi4CZHRZaZG1mdp6kP5X05fBt8Mtm9tPw1F+G+94WXneSmd2d+sZ4VGqaR5vZXWY2ZGbflbRPmXkvDK8/MnVfV2hZHmhmB5jZteE5vzezn5lZJZ+7UyStlPRWd3/Y3fe4+/WSPiTpk2a2X+q5LzOz+8xswMwuMbN9Qh1l521my83smvBt+GEz+1Cq/nPN7Gozu9zMnpF0dlie55Ssn+1mtiDc/lszuz/UcIOZHZp67p+Z2QNm9rSZfVnSpBZZkbv/QdJVktaUPLRG0hXuPmZmnWGZ+sO8rjWz7qmmF5bj8tTtic9JuL2/mX3DzJ4ws61m9uli0JjZC8zsJ6Hm7eEzUDr9hWa2U1Krks/Wg+H+F1nSazBoZvea2RtTr7nUklb4D81sWNLxU9WuJNQGJX1S0rtL5tsRpjNgZvdJelnJ4x8zswfDZ/c+M1tdZh5y97sk3S/pRanXvzHUPRiWI/3YdMv2+jC/obA+PxoC+jpJyy3Vy1JSw5ikKyStMLOucN917v49d3/G3XdJ+rKk3nLLgRLuzk+D/Eh6RNJrp7j/MCXf6NvC7Vslva/kOS7pBanbL5G0TdIrlGyY3h2mv1BSu6RHJX1E0gIl32r3SPp0mbq+Kem81O0PSLo+/P2/JV0YprNASehaBcv6HUmXTXF/m6QxSa9LrZN7JB0i6TmSNhTrLDdvJV/o7pT0P8KyPk/SQ6lpnhuW983huR2SbpZ0aqqOz0m6MPz9Zkm/U7JxbJP03yXdFh47QNIzYR0uCOt0rPT9SU23Nzy/I9zeX9KIpBeH28uUbPQXSVqi5Fv+D1Kvn3jvw3JcPs3n5AeSviZpsaQDJf1C0vvDY1dK+m9h+feR9Mpp3quJz1ZYxt9JOjus29coaW2sCo9fqqQV1lucdplp3iTps5IOCuvrJanHPiPpZ+H9PiS8/1tSj79V0vIw/bdJGpZ0cHjsPZJ+nnruy5QE6B+F238Unv9nYVnODMvTXsGyPSHpT8PfncWaJR2Xrq/0vQnT+oyk7cX3Zor18Q+Sbs96G9QoP5kXwM8s3qxkI74z/CMOFjdoU2ywJjZuqdeWBttXJX2q5DmblHSDvUrS40oFkKTbVD7YXivpodTtDZLW
hL8/Kelf0/OucFl/LOkzZR57UtI7U+vk9NRjr5f04HTzVhLmj5Xc93FJl4S/z5X005LH3yfp5vC3Sdos6VXh9nWS/i713BZJu5R0Ia1Jb5DCa7eUvj8l8/qtpHeEv0+V9MtpnvtiSQOp2xPvvaYJNiWBsVshQMPjfyPplvD3OiVdYd0VvFfpYPvT8P60pB6/UtK54e9LJa2bYXorJRX0bJjfIOmLqccfknRi6vZpKgmOkundLelN4e/3KAnKQSX/Sy7p/yp81iWdI+mqkvdyq5JwmmnZHpP0fkn7lcz/uNL6wnszGuoYl7RD0nFl6j9K0u8VQpOfmX/oimw8b3b3peHnzfOYzqGSzghdKoNmNqjk2+/y8LPVw39V8Og007pZUoeZvSJ0wb1Yyf4iKWnZ/E7SjZbsqP9YhfVtl7TXzvLQjXZAeLxoc0mdxa6ecvM+VEnXUHrZz1aysZ9qmpJ0taRjQzfSq5RsEH+Wmt4XU9P6vZIAWxFqmZhWWKel0y61Ts92R54i6bLU8i8ys6+Z2aOhm/Snkpba7EfEHqqkBfJEqu6vKWm5SUlLxST9InS5/W2F010uabO7F1L3PapkXRTNtPynSLrf3e8Ot6+Q9I5it69K1qlKPptmtsae7WIflHSkks9M0e3h/2dfSc+VdISk/5Wa9sT0wnJsVuq9nGbZTlbyxerR0I17rKZ3lbsvVfK5u0fSS0ufYMmAr+skfdjdf1b6OKZGsMWpkks2bFbSfbg09bPI3a9U0qWywmzS6LyVZWeW/KNfpeQb/zskXevuQ+GxIXc/w92fJ+kvJf2jmZ1QQX0/lvQXYR9F2slKWhq3p+47pKTOx2eY92ZJD5cs+xJ3f316sUqWcVDSjZL+Oizjlang36ykCy89vQ53v03JupyoL6zTdL1TWSfphLBhPEbSt1OPnSFplaRXuPt+SkJWmnq/3bCSLsui56b+3qxkPR6Qqnk/dz8iLO+T7n6quy9X0gq5wCobVfu4pENs8n7UlUpaPUUzfT7XSHqeJaNhn5T0BSXB9Bfh8UnrVKnPZvhidbGkv5e0LATHPSq/X/MpSdco+XwU60/vHy2+X1tnWjZ3/w93f5OSLwc/UPI/MePyuvt2Jev4XEuNfAzL8mMlPSvfmm4amIxgi9NTSvYbTXffxZJOD60sM7PFZvYGM1si6d+VdNd8yJIBKX8l6eUzzPPbSvZnvFOpDbElA1ReEDYQzyjpdhmvYBm+paTL7nth0MMCM3udpC8p6fp5OvXcD5hZtyWDO86W9N0Z5v0LSc+Y2VlhIEKrmR1pZpMGIZRZxjVKwjUdNhdK+riZHRHmu7+ZvTU89v8kHWFmfxVamx/S5IDZi7s/KunnSrq5fuTuT6YeXqJkn9tgWN5PTDOpuyW9ypLjHPdX0t1anMcTSoL6/5jZfmbWYmbPN7NXh2V4qz07KGVAyca5kvftDiWBemZ4z45TEhrfqeC1CmH+fCWftxeHnyOVrO/iIJKrlKzvzlDjB1OTWBxq7Q/Te294fbn5LZO0Ws+O6rxK0hvM7ITQQjxDyReA26ZbNjNrN7N3mtn+7r5Hz37epOR/b1l4D6bk7g8o6XI9M9S1QklPyFfc/cJpVhmmknVfKD+V/6jywSPHSvqNkg3Sl8J9pyv5pjso6a/DfSdK+o9w3xNKBiIsCY/1SNqoZOf4d8PPlPvYUnX8Tkk3XHvqvo+EuoeVBNU5qceuk3T2NNN7jpLusaeUbMzv1d77Dh9RssG+LyzHZZIWVTDv5UqC48mwnm4vrluV7JtKvaYjrI97p3jsFEm/VrJB2yzpm6nHTgzvx9NKRrf9pHQ5ppjee8J7+raS+5cr2Y+2M0zz/Zpm/6qkr4T18jsl++vSz91fyb7WLaG2jZLeHh77rJKWyE5JD0o6bZpaS/ffHhGW8enwvqxOPXbpdJ8jJV8Srpni/pcrCZjnKGmFrgvLdZ+Sw0DSg0fOC5/D7UpaexPrO6zX8bBcO5UMoLpS0oGp
168O0306vPaImZZNyQCQ68Nn6Rkl/1evTL3um0r2ow2G93Cvz5iSfb/DSlp8nwjrdWf6J+ttUKP8FHeYAgAQBboiAQBRIdgAAFEh2AAAUSHYAABRIdgAAFGp11nL5+XEE0/066/P64nNAQAZmPKge6lBWmzbt2+f+UkAAKhBgg0AgEplEmyWXBn5akuuUXV/BScLBQCgIlntY/uikut1vcWSKyUvmukFAABUou7BZsmVj1+l5JxtcvdRJdclAgBg3rLoinyekjNvX2JmG83s61NcmgQAgDnJItjaJL1E0lfd/WglZ7Pe6+KTZnaamfWZWV9/f3+9awQANKgsgm2LkktM3BFuX60k6CZx94vcvcfde7q6uupaIACgcdU92Dy5aOJmM1sV7jpByXWNAACYt6xGRX5Q0hVhRORDkt6bUR1ATRQKrh3DoxodG1d7W6uWLW5XS0vZEyUAqKJMgs3d71ZyhWYg92YbUoWCa9NTQzp1XZ+2DIyou7NDF6/p0aqDlhBuQB1w5hFgGsWQWn3BBvWef4tWX7BBm54aUqFQ/srzO4ZHJ0JNkrYMjOjUdX3aMcxRLUA9EGzANOYSUqNj4xPPL9oyMKLRsfGa1gogQbAB05hLSLW3taq7s2PSfd2dHWpva61JjQAmI9iAacwlpJYtbtfFa3omXlfcx7ZscXtNawWQMPfy+wryoqenx/v6+rIuA01orgNBGBUJ1FzZfyiCDZgBIQXkUtl/woa4gjaQpZYWU9eShVmXAaBC7GMDAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYAMARKUti5ma2SOShiSNSxpz954s6gAAxCeTYAuOd/ftGc4fyESh4NoxPKrRsXG1t7Vq2eJ2tbRY1mUB0cgy2ICmUyi4Nj01pFPX9WnLwIi6Ozt08ZoerTpoCeEGVElW+9hc0o1mdqeZnZZRDUDd7RgenQg1SdoyMKJT1/Vpx/BoxpUB8ciqxdbr7o+b2YGSfmRmD7j7T9NPCIF3miStXLkyixqBqhsdG58ItaItAyMaHRvPqCIgPpm02Nz98fB7m6T1kl4+xXMucvced+/p6uqqd4lATbS3taq7s2PSfd2dHWpva82oIiA+dQ82M1tsZkuKf0v6c0n31LsOIAvLFrfr4jU9E+FW3Me2bHF7xpUB8ciiK/IgSevNrDj/b7v79RnUAdRdS4tp1UFLtH5tL6MigRqpe7C5+0OS/qTe8wVmUq9h+C0tpq4lC6s+XQAJhvsDYhg+UGv1PH6TU2oBYhg+UEvFL46rL9ig3vNv0eoLNmjTU0MqFLwm8yPYADEMH6ilen9xJNgAVX8YfqHg6h/ara0Du9Q/tLtm30yBRlDvL44EG6DqDsOvd7cLkHf1Pn7T3PP/z9bT0+N9fX1Zl4HIVWvndv/Qbq2+YMOkb6jdnR1av7aX0ZBoSjUanFX2hYyKBIJqDcNnfx0wWb2P3yTYgCordruUttg4bRaaWT2P32QfG1BlnDYLyBYtNqDKOG0WkC2CDagBTpsFZIeuSABAVAg2AEBUCDYAQFQINgBAVAg2AEBUCDYAQFQINgBAVAg2AEBUOEAbqLFqXTWg1tMEYkGwATVUi8t11OgSIEA06IoEamjH8OhEAEnJ5WtOXdenHcOjuZomEBOCDaihWlybjeu9AdMj2IAaKl6bLW2+12arxTSBmBBsQA3V4tpsXO8NmJ65e9Y1zKinp8f7+vqyLgOYE0ZFAjVR9gPPqEigxmpxbTau9waUR1ckACAqBBsAICoE
GwAgKgQbACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKpkFm5m1mtlGM7s2qxoAAPHJssX2YUn3Zzh/AECEMgk2M+uW9AZJX89i/gCAeGXVYvsXSWdKKmQ0fwBApOoebGZ2kqRt7n7nDM87zcz6zKyvv7+/TtUBABpdFi22XklvNLNHJH1H0mvM7PLSJ7n7Re7e4+49XV1d9a4RANCg6h5s7v5xd+9298MkvV3Sze7+rnrXAQCIE8exAQCi0pblzN39Vkm3ZlkDACAutNgAAFEh2AAAUSHYAABRIdgAAFEh2AAAUcl0VCTyr1Bw7Rge1ejYuNrbWrVscbtaWizrsgCgLIINZRUKrk1PDenUdX3aMjCi7s4OXbymR6sOWkK4AcgtuiJR1o7h0YlQk6QtAyM6dV2fdgyPVm0ehYKrf2i3tg7sUv/QbhUKXrVpA2hOs2qxmdmPJZ3h7r+sUT3IkdGx8YlQK9oyMKLRsfGqTJ8WIYBamLbFZmaHl5yg+ExJ/2xml5jZwbUtDVlrb2tVd2fHpPu6OzvU3tZa8TSma5HVo0VYSR21QEsUyM5MLbabJB1bvOHudyk5G//Jkq43s+9L+qy7j5SbABrXssXtunhNz14tqmWL2yt6/Uwtslq3CCuto9poiQLZmmkf259LOi99h5mZpE2Svirpg5J+a2an1KY8ZKmlxbTqoCVav7ZXG846XuvX9s5q4zxTi6waLcJq1FFt9Z4fgMmmDTZ3/7W7v7N428x+LmmrpH+WtELSeyQdJ+nlZnZR7cpEVlpaTF1LFmpF5yJ1LVk4qxZHaYvs6EOW6pyTDteu0TH1D+1WZ8cCXbymZyLcZtsinGsdUm1ahlnND8Bksx3uf7qke929dIfBB83s/irVhEgUW2RbBkZ09CFL9dHXrdJZ1/xqUvfcC7v21fq1vTU9Ti5dR1EtWoZZzQ/AZLMa7u/u90wRakVvqEI9TSfmQQbFfXTdnR06/bjnT4Sa9Gz33MDInjm3COdSh1S7lmFW8wMwmZXPqfzo6enxvr6+rMuoumYYZFA8c8mu0TG9+nO37vX4hrOO14rORXWro15nUOGMLUDNlf2H4gDtDDXDIIPiPrpF7W11GSgyUx21bBlmOT8AzyLYMtRogwzm021K9xyAeuFckRlqpEEG8+02TR86QPccgFqixZahRmrFVKPblO45APVAiy1DjdCKSQ/+aKRuUwDNixZbxvLciil2P66+YIMeeHIo08EfAFApgg1lpbsfL7z1QZ1/8lEN0W0KoLnRFYmy0qM2N24e1Odv2KRzTjpcL3ruEnW0t+Wu2xQAJFpsmEbpSYo3bh7Up669Tx3tbbnrNgWAIoINZTXSqE0AKKIrEmU1wqhNAChFsGUs7+cULI7aBIBGQbBlqBlOggwA9cY+tgw1w0mQAaDeaLFlqBFOgpz3rtLZiGlZAJRHsGUo7ydBjqmrNKZlSSOsgb3RFZmhvA+nj6mrNKZlKUqf8qz3/Fu0+oIN2vTUUFRXYQfmghZbhvI+nL4RukorFdOyFJUL6/VrexnJiqZGiy1jeT4JcumZR6R8dZXORkzLUhRjWAPVQLChrLx3lc5GTMtSFGNYA9Vg7vnvj+/p6fG+vr6sy2hKMQ1OiGlZpHgHxAAVKvshJ9jQ8PIYWPWqKY/LDtRJ2Q960w8eYcPQ2PLYaqlnTZzyDNhbU+9jy8Nw6ULB1T+0W1sHdql/aHfUQ7Vrsax5HMafx5pqrZk+x8i/pm6xZT1cOo+tjVqp1bLmcWRgHmuqpWb6HKMxNHWLLesNUDN9s6/VsuZxZGAea6qlZvocozE0dbBlvQHKOljrqVbLmsdh/HmsqZaa6XOMxtDUXZHFDVBpF0q9NkB5P1dkNdVqWfN49pY81lRLzfQ5RmNo+uH+WY6KbKZ9E820rM2G9xYZ
4Ti2vGqmww2aaVmbDe8tMsBxbHnVTMchlS5rcYg4G8PG10yfY+QfwYZM0H0FoFbqPirSzPYxs1+Y2S/N7F4z+5/1rgHTm+/BtpW8niHiAGolixbbbkmvcfedZrZA0s/N7Dp3vz2DWqIz330d821JVfr6LIaIsx8IaA51b7F5Yme4uSD85H8ESwOoxinC5tuSqvT19T6GMA+nTwNQH5kcoG1mrWZ2t6Rtkn7k7ndkUUdsqtG9N9+WVKWvr/dBzHR9As0jk8Ej7j4u6cVmtlTSejM70t3vST/HzE6TdJokrVy5MoMqG0uh4BrZMzbv7r35Hmxb6evrfRAzZ8cAmkemp9Ry90FJt0o6cYrHLnL3Hnfv6erqqnttjaTYzfbgtuF5d+/NtyU1m9cXh4iv6FykriULa7q/K+vTpwGon7ofoG1mXZL2uPugmXVIulHS+e5+bbnXxHyAdjX0D+3W6gs2qGvfhfro61bprGt+Na8h9NUYgJK3QRocXgBEJ1cHaB8s6TIza1XSYrxqulDDzIrdbFsGRvT5GzbpnJMO19KOBeru7NDB+3fMesM934Nt83iwbrOdvxFoZnUPNnf/laSj6z3fmKX3a23cPKj3f+tOdXd2aP3aXjbcKXkMXADV19SXrYlFs10mBQCmwym1IkA3GwA8i2CLBN1sAJAg2DKWxxGEANDICLYMMQQdAKqPwSMZ4jRPAFB9TdFiy2t3H6d5AoDqiz7Y8tzdN9/zMuZJXr88AGg+0XdF5rm7rxbHn833IqFzmV6ll4Spdm0AMJXoW2x57u6r9vFn1W6dVjq9cl8e1q/tnTgEIc8tZwBxib7Flvezus/lDPflWj7Vbp1WOr1KvjzkueUMIC7RB1tsp5uartuv2q3TSqdXyZeHPLecAcQl+mBLd/dtOOt4rV/bm+vur5n2Q03X8ql267TS6VXy5SHvLWcA8aj79djmolmux1bJfqitA7vUe/4te712w1nH6+D9OzLZx1Z87nSjItnHBqDKym44CLYcKV4wtHT4f3oQxkzPqfaw+2pOj0MCAFRR2Y1H9F2RtVbNIeyV7IeaqdtvLoNR6iXPtQGIR/TD/WupGt1r6VaMmc14wHY9L1FD9yGARkSLbR7mO4S9dITjuf92jy5810tnHME5Xcunmi1IhugDaES02OZhvkPYtw/vnhQcN963TZJ01fuPlbvPujVW7RYWQ/QBNCJabPMw3yHsf9izd3DceN82jRV8Tvuhqt3CKnaNpnV3dsiMbkgA+UWwzcN8D/5uLRMcrXPMjWq3sFpNOv/koyYt3/knHzXn+gCgHuiKnIf5DuToaG/V595ylP7p6l9NdB1+7i1HqaN9fgdUV+tqAS0tLbrstod1zkmHa2nHAg2O7NFltz2s81YfNafpAUA9cBxbhgoF1yM7hvXojl1a1N6qXaPjOnTZIh22bHHND6jOYnoAUEUcoJ1XeT6guhbTA4AqKbshoisyY8Wh+80yPQCoNQaPAACiQrABAKJCsAEAokKwAQCiQrABAKJCsAEAokKwAQCiQrABAKJCsAEAokKwAQCiQrABAKJCsAEAokKwAQCiQrABAKLCZWuqqPTaZZ0dCzQwsmfaa5mNjRW0bedu7RkvaEFriw7cd6Ha2qr3fYPrqQFoNgRblUx1tekL3/VSfemm3+jG+7ZNefXpsbGCHnhqSKdffuek1/zxQUuqEm5cARtAM6Irskp2DI9OBIgkbRkY0emX36mTX3rIxO1T1/Vpx/DoxGu27dw9EWrp12zbubtmNZXWAACxIdiqZHRsfCJAirYMjGhpx4JJt0fHxidu7xkvTPmasfFCTWtK1wAAsSHYqqS9rVXdnR2T7uvu7NDgyJ5Jt9vbWiduL2htmfI1ba3VeVvK1ZSuAQBiQ7BVybLF7bp4Tc9EkBT3l11z5+aJ2xev6dGyxe0Trzlw34W68F0v3es1B+67sGY1ldYAALExd8+6hhn19PR4X19f1mXMaD6jIsfGC2pjVCQA
VKrshoxRkXNQLixaWkxdSya3tpYtbp947o7h0b2Cpa2tRcuXdpTOomqmqgkAYkawzdJshtAz3B4A6o99bLNU6RD6QsH15DN/0PDuMZ1z0uE6+pClcxpuXyi4+od2a+vALvUP7VahkP+uYwDIUt1bbGZ2iKR1kp4rqSDpInf/Yr3rmKtKhtBP1VI7/+Sj9PkbNmnj5sGKh9vT4gOA2cuixTYm6Qx3f5GkYyR9wMwOz6COOalkCP1UrbqzrvmVTj/u+bMabs8B1gAwe3UPNnd/wt3vCn8PSbpf0op61zFXnR0Lphyi35k6ELtcq644/L7S4fYcYA0As5fp4BEzO0zS0ZLumOKx0ySdJkkrV66sa13TGRjZoy/d9Budc9LhWtqxQIPh9nmrj5oYfVhs1aVDqbuzQ8uXdui5++1TcTdiuelwgDUAlJfZcWxmtq+kn0g6z92/P91z83Qc29aBXeo9/5a97t9w1vFa0blIUvX2jbGPDQDKKrsRzCTYzGyBpGsl3eDuX5jp+XkItuKxayN7xvTgtmF96abfauPmQUlJK2r92t5Jx4tV68BoDrAGgCnl5wBtMzNJ35B0fyWhlgdTtZw+95aj9NnrN6l/5+4p95tV68BoDrAGgNmpe4vNzF4p6WeSfq1kuL8kne3uPyz3mqxbbP1Du7X6gg177ev67mnH0IoCgGzkp8Xm7j/XNAXlUbnRiZJoTQFAznDmkQpw+RcAaBwEWwVqefkXTpkFANXFSZArtLCtRZ9605Fa1N6qXaPjWliFS8swnB8Aqo9gq8CO4VGt+eYv9ho8UjrEfy7TneqUWfOdLgA0M7oiK1CrU1txyiwAqD6CrQK1GjzCoBQAqD6CrQK1GjxSy0EpANCsMjtX5GxkfYC2VLtTW3HKLACYk/wcoN2oanVqK06ZBQDVRVckACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKk0x3J9jxQCgeUQfbJxBHwCaS/RdkeXOoL9jeDTjygAAtRB9sHEGfQBoLtEHG2fQB4DmEn2wcQZ9AGgu0Q8eaWkxrTpoidav7WVUJAA0geiDTeIM+gDQTKLvigQANBeCDQAQFYINABAVgg0AEBWCDQAQFYINABAVgg0AEBWCDQAQFXP3rPM63U8AAAPVSURBVGuYkZn1S3o06zpq7ABJ27MuokGx7uaOdTd3rLu5qdZ62+7uJ071QEMEWzMwsz5378m6jkbEups71t3cse7mph7rja5IAEBUCDYAQFQItvy4KOsCGhjrbu5Yd3PHupubmq839rEBAKJCiw0AEBWCLWNmdoiZ3WJm95vZvWb24axraiRm1mpmG83s2qxraSRmttTMrjazB8Jn79isa2oUZvaR8L96j5ldaWb7ZF1TXpnZN81sm5ndk7rvOWb2IzP7bfjdWe35EmzZG5N0hru/SNIxkj5gZodnXFMj+bCk+7MuogF9UdL17v7Hkv5ErMOKmNkKSR+S1OPuR0pqlfT2bKvKtUsllR5r9jFJN7n7CyXdFG5XFcGWMXd/wt3vCn8PKdnArMi2qsZgZt2S3iDp61nX0kjMbD9Jr5L0DUly91F3H8y2qobSJqnDzNokLZL0eMb15Ja7/1TS70vufpOky8Lfl0l6c7XnS7DliJkdJuloSXdkW0nD+BdJZ0oqZF1Ig3mepH5Jl4Ru3K+b2eKsi2oE7r5V0uclPSbpCUlPu/uN2VbVcA5y9yek5Iu9pAOrPQOCLSfMbF9J10j6B3d/Jut68s7MTpK0zd3vzLqWBtQm6SWSvuruR0saVg26g2IU9ge9SdJ/krRc0mIze1e2VaEUwZYDZrZASahd4e7fz7qeBtEr6Y1m9oik70h6jZldnm1JDWOLpC3uXuwZuFpJ0GFmr5X0sLv3u/seSd+X9F8yrqnRPGVmB0tS+L2t2jMg2DJmZqZkX8f97v6FrOtpFO7+cXfvdvfDlOy8v9nd+eZcAXd/UtJmM1sV7jpB0n0ZltRIHpN0jJktCv+7
J4iBN7P1b5LeHf5+t6R/rfYM2qo9Qcxar6RTJP3azO4O953t7j/MsCbE74OSrjCzdkkPSXpvxvU0BHe/w8yulnSXkhHNG8UZSMoysyslHSfpADPbIukTkj4j6Soz+zslXxTeWvX5cuYRAEBM6IoEAESFYAMARIVgAwBEhWADAESFYAMARIVgAwBEhWADAESFYANyzsz+s5ltSN1+iZndnGVNQJ5xgDaQc2bWouTSKCvcfdzMblFyDb+7Mi4NyCVOqQXknLsXzOxeSUeY2QslPUaoAeURbEBjuF3JeUXXau8rEgNIIdiAxnC7pEslfSVc7BJAGexjAxpA6IL8iaQXuvtw1vUAecaoSKAxfFjSxwk1YGYEG5BjZvZ8M3tAUoe7X5Z1PUAjoCsSABAVWmwAgKgQbACAqBBsAICoEGwAgKgQbACAqBBsAICoEGwAgKgQbACAqPx/JQ4GHfQdST4AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 504x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "booster = AdaBoostR2()\n",
    "booster.fit(X_train, y_train, T = 50, stub_depth = 4, random_state = 123)\n",
    "\n",
    "fig, ax = plt.subplots(figsize = (7,5))\n",
    "sns.scatterplot(x = y_test, y = booster.predict(X_test));\n",
    "ax.set(xlabel = r'$y$', ylabel = r'$\\hat{y}$', title = 'Fitted vs. Observed Values for AdaBoostR2')\n",
    "sns.despine()"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Edit Metadata",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
